Welcome to the first Women’s Data Science Study group.

Today, we had an introduction to R and RStudio and learned a little about R Markdown.

Now we will create an HTML file with interactive plots, using the historical air quality data available on the City of Calgary’s open data portal.

About the dataset

The dataset has daily air quality data accessible for various parameters at Calgary monitoring stations.

A smoky Calgary on Wednesday, Aug. 15, 2018, by Leah Hennel


“The Air Quality Health Index (AQHI) is calculated hourly at certain monitoring stations, where the AQHI is a simple way to interpret air quality conditions: it provides a number from 1 to 10+ which indicates the relative health risk associated with local air quality. The higher the number, the greater the health risk.”

The AQHI is calculated based on a mixture of common air pollutants which are known to harm human health.

Pollutants

  • Ground-level Ozone (O3): the main sources are vehicle and industrial emissions in urban centres.

  • Nitrogen Dioxide (NO2): released by motor vehicle emissions and industrial processes that rely on fossil fuels; it also contributes to the formation of the other two pollutants.

  • Fine Particulate Matter (PM2.5): a mixture of tiny airborne particles that can be inhaled deep into the lungs. These particles can be emitted directly by vehicles, industrial facilities, and natural sources such as forest fires, or formed indirectly through chemical reactions among other pollutants.

The information about the pollutants is found at The City of Calgary’s open data portal.
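To make "calculated based on a mixture of pollutants" concrete, here is a minimal sketch of the AQHI formula as we understand it to be published by Environment and Climate Change Canada. The function name `aqhi` and its argument names are our own, not part of any package. The official index uses 3-hour average concentrations (NO2 and O3 in ppb, PM2.5 in µg/m³), so values computed from daily averages will only approximate the reported index.

```r
# Sketch of the AQHI formula. Assumptions: coefficients as published by
# Environment and Climate Change Canada; inputs are 3-hour average
# concentrations (NO2 and O3 in ppb, PM2.5 in ug/m3).
# `aqhi` is a hypothetical helper written for this demonstration.
aqhi <- function(no2, o3, pm25) {
  # Each term is the excess health risk attributed to one pollutant
  risk <- (exp(0.000871 * no2) - 1) +
          (exp(0.000537 * o3) - 1) +
          (exp(0.000487 * pm25) - 1)
  # Scale the combined risk; the reported index is this value rounded
  # to a whole number, with anything above 10 shown as "10+"
  (1000 / 10.4) * risk
}

# Example call with made-up concentrations
aqhi(no2 = 10, o3 = 30, pm25 = 10)
```

Because every term grows with its pollutant's concentration, raising any one of the three inputs raises the index.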

The data used for this R Markdown document is available here

Figure 1

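library(plotly)  # provides plot_ly(), layout() and %>%; also attaches ggplot2

# Line chart of the daily average AQHI over time
plot_ly(data_daily,
        x = ~Date,
        y = ~`Average Daily Value`,
        type = 'scatter',
        mode = 'lines') %>%
  layout(title = "Daily Air Quality Index")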

In the figure above, the AQHI rose above 5 in August 2015 and August 2018, indicating moderate to high risk. These high levels were caused by hundreds of B.C. wildfires.
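The same reading can be checked directly in the data rather than by eye. The sketch below uses a small made-up data frame, `data_daily_toy`, as a stand-in for `data_daily`; only the column names `Date` and `Average Daily Value` are taken from the plot code, and the values are invented for illustration.

```r
# Toy data standing in for `data_daily` (values are made up for illustration)
data_daily_toy <- data.frame(
  Date = as.Date(c("2018-08-13", "2018-08-14", "2018-08-15", "2018-08-16")),
  `Average Daily Value` = c(3.2, 4.8, 7.5, 6.1),
  check.names = FALSE  # keep the column name with spaces unchanged
)

# Keep only the days where the daily average AQHI exceeded 5
high_days <- data_daily_toy[data_daily_toy$`Average Daily Value` > 5, ]

# Count the high-AQHI days per month (labels look like "2018-08")
table(format(high_days$Date, "%Y-%m"))
```

Running the same filter on the real `data_daily` should list the days behind the peaks seen in Figure 1.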

Figure 2

# `frame = month` becomes the animation slider once the plot is passed to ggplotly()
gg <- ggplot(data_daily_station,
             aes(year, `Average Daily Value`, color = `Station Name`, frame = month)) +
  geom_point()
ggplotly(gg) %>%
  highlight("plotly_hover")

This plot presents the Daily Air Quality Index (from Jan 2013 to Nov 2018) by Station and by month. Press Play to see the AQHI’s behavior over time.
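The plot relies on two columns, `year` (x axis) and `month` (animation frame), whose construction is not shown here. One plausible way to derive them from a `Date` column is sketched below in base R. The data frame `station_toy` and its values are invented for illustration, and the actual `data_daily_station` may have been prepared differently.

```r
# Toy data standing in for the station-level data (made-up values)
station_toy <- data.frame(
  Date = as.Date(c("2013-01-15", "2013-02-15", "2014-01-15", "2014-02-15")),
  `Station Name` = "Station A",
  `Average Daily Value` = c(2.5, 3.0, 2.8, 3.4),
  check.names = FALSE  # keep the column names with spaces unchanged
)

# Derive the column used on the x axis (`year`)
# and the column used as the animation frame (`month`)
station_toy$year  <- as.integer(format(station_toy$Date, "%Y"))
station_toy$month <- as.integer(format(station_toy$Date, "%m"))

station_toy
```

With these columns in place, each animation frame shows one calendar month, with the years laid out along the x axis.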

Exercise 1: As above, create another plot, this time with the variable month on the x axis and the variable year as the frame (play). Did you find any new patterns in the data?

Figure 3

gg <- ggplot(data_daily_AQHI_pollutants_wide,
             aes(`Nitrogen Dioxide`, `Air Quality Index`, color = factor(year), frame = month)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "lm")
ggplotly(gg) %>%
  highlight("plotly_hover")

Figure 4

gg <- ggplot(data_daily_AQHI_pollutants_wide,
             aes(Ozone, `Air Quality Index`, color = factor(year), frame = month)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "lm")
ggplotly(gg) %>%
  highlight("plotly_hover")

Exercise 2: Create a scatter plot of Air Quality Index versus Fine Particulate Matter (PM2.5). Do you see any difference from the previous two scatter plots? If yes, where?

Figure 5

Exercise 3: What is the difference between Figure 3 and Figure 5? Can you think of any use for Figure 5 in your daily routine?

Thank You!

I hope you enjoyed this R Markdown demonstration and that you are excited to learn all the amazing things you can create using R.